Data Strategy https://tigosoftware.com/ en What is Data Gravity? https://tigosoftware.com/what-data-gravity <span class="field field--name-title field--type-string field--label-hidden">What is Data Gravity?</span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." href="/user/1" lang="" about="/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">admin</a></span> <span class="field field--name-created field--type-created field--label-hidden">Sun, 03/20/2022 - 18:52</span> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><h2><b>What is </b><b>Data Gravity</b><b>?</b></h2> <p>As datasets grow larger and larger, moving the data around to various applications becomes <strong>cumbersome</strong> and <strong>expensive</strong>. This effect is known as data gravity.</p> <p>The term data gravity was coined by Dave McCrory, a software engineer, to explain the idea that large masses of data exert a gravitational pull on IT systems. In physics, objects with sufficient mass pull objects with less mass towards them. This principle is why the Moon orbits the Earth and the Earth orbits the Sun.</p> <p>Consider data as if it were a planet or other object with sufficient mass. As data accumulates (builds mass), it becomes more likely that additional services and applications will be attracted to it. This is the same effect gravity has on objects around a planet. As the mass or density increases, so does the strength of the gravitational pull, and as things get closer to the mass, they accelerate toward it ever faster.
The figure below applies this analogy to data.</p> <p><img src="https://i.imgur.com/VSTdQ41.png" /></p> <p>Note: latency and throughput apply equally to applications and services.</p> <h2><strong>Why does data gravity cause problems?</strong></h2> <p>Data doesn’t literally create a gravitational pull, but smaller applications and other bodies of data tend to gather around large data masses. As the datasets and applications associated with these masses keep growing, they become increasingly difficult to move. This creates the <strong>data gravity problem.</strong></p> <p>Data gravity hinders an enterprise’s ability to be nimble or innovative once it becomes severe enough to lock the enterprise into a single cloud provider or an on-premises data center. To overcome the consequences of data gravity, organizations are turning to data services that connect to multiple clouds simultaneously.</p> <p>How does this relate to <a href="http://www.database.com/" target="_blank">Database.com</a>, Salesforce’s cloud database service? If <a href="http://www.salesforce.com/" target="_blank">Salesforce.com</a> can build a new, general-purpose data mass that is still close in locality to its other data masses and app/service properties, it will be able to grow its business and customer base that much more quickly. It also enables VMforce to store data outside ForceDB (Salesforce’s core database), enabling new adjacent services with persistence.</p> <p><img src="https://i.imgur.com/Ru1w0gA.png" /></p> <p>The analogy extends to weight: just as <a href="http://www.exploratorium.edu/ronh/weight/index.html" target="_blank">your weight differs from one planet to another</a>, services and applications (compute) have different weights depending on data gravity and the data mass(es) they are associated with.</p> <h2><b>Data Gravity</b><b>, Storage, and Cloud Computing</b></h2> <p>Duplicated data, apart from backup and disaster recovery (DR) copies, is wasteful, so maintaining a single big data repository or data lake is the best way to avoid siloed, disparate datasets.</p> <p>Unlike a data warehouse, which requires data to conform to a predefined structure, a data lake with appropriate security can hold raw data and content from multiple data sources.</p> <p>A cost-effective, scalable data lake seems easy enough, and it can be, depending on an enterprise’s data needs. Many organizations have a suitable on-premises data lake, but accessing it from the cloud poses several challenges:</p> <ul><li aria-level="1">Latency – The farther you are from your cloud, the higher your latency. For a TCP flow limited by a fixed window size, throughput cannot exceed the window size divided by the round-trip time (RTT), so every doubling of RTT halves per-flow throughput. This makes slowdowns more likely, especially for data-intensive analytics that use artificial intelligence and machine learning.</li> <li aria-level="1">Connectivity – Ordering and managing dedicated network links, such as AWS Direct Connect or Google Cloud Dedicated Interconnect, can be costly. Balancing redundancy, performance, and operational costs is difficult.</li> <li aria-level="1">Support – Operating and maintaining storage systems is generally expensive and complicated enough to require dedicated expert personnel.</li> <li aria-level="1">Capacity – Growth requires a location and infrastructure plan as well as a budget.</li> </ul><p>Latency can be reduced by co-locating an on-premises data lake near public cloud regions and by purchasing direct network connections. Still, the cost is prohibitive for many midsized companies that want to use the innovative services of multiple clouds.
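The throughput claim in the Latency item above follows from a simple bound. A window-limited TCP flow can have at most one window of unacknowledged data in flight per round trip, so its throughput cannot exceed the window size divided by the RTT. The Python sketch below illustrates that assumption only: it ignores packet loss, slow start and congestion control. The function name max_tcp_throughput_mbps and the sample RTT values are hypothetical, chosen for illustration rather than taken from any measurement.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Best-case throughput ceiling (Mbit/s) for one window-limited TCP flow.

    Hypothetical helper for illustration. Assumes at most one window of
    unacknowledged data per round trip, so throughput <= window / RTT.
    Loss, slow start, and congestion control are ignored.
    """
    if window_bytes <= 0 or rtt_ms <= 0:
        raise ValueError("window_bytes and rtt_ms must be positive")
    bits_per_round_trip = window_bytes * 8
    rtt_seconds = rtt_ms / 1000.0
    return bits_per_round_trip / rtt_seconds / 1_000_000


if __name__ == "__main__":
    # 65,535 bytes is the largest window TCP allows without the window
    # scaling option; the RTT values are illustrative, not measured.
    window = 65_535
    for rtt in (5, 10, 20, 40):
        ceiling = max_tcp_throughput_mbps(window, rtt)
        print(f"RTT {rtt:>2} ms -> at most {ceiling:6.1f} Mbit/s")
```

With the 65,535-byte window, the ceiling drops from roughly 105 Mbit/s at 5 ms to roughly 13 Mbit/s at 40 ms, halving with each doubling of RTT. A larger window raises the ceiling, but the inverse relationship with RTT stays the same. That is one reason the distance between an on-premises data lake and a cloud region matters.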
</p> </div> <div class="field field--name-field-blog-category field--type-entity-reference field--label-inline clearfix"> <div class="field__label">Category</div> <div class="field__item"><a href="/taxonomy/term/47" hreflang="en">IT Consulting</a></div> </div> <div class="field field--name-field-tags field--type-entity-reference field--label-inline clearfix"> <h3 class="field__label inline">Tags</h3> <ul class="links field__items"> <li><a href="/taxonomy/term/79" hreflang="en">Big Data</a></li> <li><a href="/taxonomy/term/229" hreflang="en">Data Strategy</a></li> <li><a href="/taxonomy/term/46" hreflang="en">data analytics</a></li> </ul> </div> <section class="field field--name-comment field--type-comment field--label-above comment-wrapper"> </section> Sun, 20 Mar 2022 11:52:35 +0000 admin 1115 at https://tigosoftware.com https://tigosoftware.com/what-data-gravity#comments Brief definition of Data Auditing, Data Curation, Data Stewardship and Data Governance https://tigosoftware.com/brief-definition-data-auditing-data-curation-data-stewardship-and-data-governance <span class="field field--name-title field--type-string field--label-hidden">Brief definition of Data Auditing, Data Curation, Data Stewardship and Data Governance </span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." 
href="/user/1" lang="" about="/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">admin</a></span> <span class="field field--name-created field--type-created field--label-hidden">Wed, 03/09/2022 - 21:51</span> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><h2><strong>What is Data Auditing?</strong></h2> <p>Data auditing is the assessment of data quality throughout the data’s lifecycle to ensure that it is accurate and effective for a specific use.</p> <p>A data audit assesses how well a company’s data is fit for a given purpose. This involves profiling the data and assessing the impact of poor-quality data on the organization’s performance and profits.</p> <h2><strong>What is Data Curation?</strong></h2> <p>Data curation is the process of creating, organizing and maintaining data sets so they can be accessed and used by people looking for information.</p> <p>Data curation is an end-to-end process of preparing and managing data so business users can easily understand and readily use it. It is the skill of selecting and bringing together relevant data into structured, searchable data assets that are ready for analysis.</p> <p>The ultimate goal of data curation is to reduce the time from data to insights. With the growing amount of data in organizations today, data curation is becoming essential. Without it, business users can neither locate useful data nor use it to its maximum potential.</p> <h2><strong>What is Data Governance?</strong></h2> <p>Data governance (DG) is <b>the process of managing the availability, usability, integrity and security of the data in enterprise systems, based on internal data standards and policies that also control data usage</b>.
Effective data governance ensures that data is consistent and trustworthy and doesn't get misused.</p> <p><img src="https://i.imgur.com/dKLpnmw.png" /></p> <h2><strong>What is Data Stewardship?</strong></h2> <p>A data steward <b>fills an oversight or data governance role within an organization</b> and is responsible for ensuring the quality and fitness for purpose of the organization's data assets, including the metadata for those assets.</p> <p>Data stewardship is not necessarily an information technology function, nor does it need to be a full-time position, although doing it well deserves appropriate reward. It is a role with a defined set of responsibilities and accountability to line-of-business management.</p> <p><img src="https://i.imgur.com/R92YVQU.png" /></p> </div> <div class="field field--name-field-blog-category field--type-entity-reference field--label-inline clearfix"> <div class="field__label">Category</div> <div class="field__item"><a href="/taxonomy/term/9" hreflang="en">Cloud – big data</a></div> </div> <div class="field field--name-field-tags field--type-entity-reference field--label-inline clearfix"> <h3 class="field__label inline">Tags</h3> <ul class="links field__items"> <li><a href="/taxonomy/term/79" hreflang="en">Big Data</a></li> <li><a href="/taxonomy/term/229" hreflang="en">Data Strategy</a></li> </ul> </div> <section class="field field--name-comment field--type-comment field--label-above comment-wrapper"> </section> Wed, 09 Mar 2022 14:51:30 +0000 admin 1081 at https://tigosoftware.com https://tigosoftware.com/brief-definition-data-auditing-data-curation-data-stewardship-and-data-governance#comments What is a Data Pipeline?
https://tigosoftware.com/what-data-pipeline <span class="field field--name-title field--type-string field--label-hidden">What is a Data Pipeline?</span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." href="/user/1" lang="" about="/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">admin</a></span> <span class="field field--name-created field--type-created field--label-hidden">Mon, 12/20/2021 - 23:23</span> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><h2>Data Pipeline Definition</h2> <p>A data pipeline is a series of automated, consecutive data processing steps that ingest raw data from disparate sources and move it to a destination. Data pipeline software facilitates the seamless, automated flow of data from one system to another. Common steps include aggregation, augmentation, transformation, enrichment, filtering, grouping, and running algorithms. A common example is the ETL (extract, transform, load) pipeline.</p> <p><img src="https://i.imgur.com/5jcNP7S.png" /></p> <h2>What is a Data Pipeline?</h2> <p>"Data pipeline" is a broad term for the chain of processes involved in moving data from one or more systems to another. Data pipeline tools and software enable the smooth, efficient flow of data; automate processes such as extraction, transformation, validation, combining, and loading; protect data integrity; and reduce bottlenecks and latency.</p> <p>The steps that occur between the data sources and the destination depend entirely on the use case.
A simple data pipeline may include only a single static data source, an extraction step, a loading step, and a single data warehouse.</p> <p>A complex data ingestion pipeline may involve ingesting and processing multiple data streams in parallel, including real-time sources; transformation; preparing training datasets for machine learning; a visual analytics destination; and multiple pipelines that in turn feed other pipelines or applications. The best data pipeline solution depends on the nature of the project and the business objectives.</p> <h2>Data Pipeline Solutions</h2> <p>A well-managed data pipeline infrastructure is a crucial element of data science and data analytics. Data flow is susceptible to disruption, and useful analysis depends on data reaching its intended destination uncorrupted and on time. An optimized data processing pipeline reliably delivers datasets that are centralized, structured, and accessible to data scientists for further analysis.</p> <p><img src="https://i.imgur.com/Acl6pai.png" /></p> <p>Some of the most popular data pipeline architectures include:</p> <ul role="list"><li><strong>Batch Processing: </strong>Batch pipelines suit work that does not require real-time data. Large amounts of data are moved at regular intervals.</li> <li><strong>Real-Time: </strong>Also known as stream processing, a real-time pipeline processes streaming data continuously as it is created.</li> <li><strong>Cloud Native: </strong>A cloud-native pipeline is hosted in the cloud and works with cloud-based data, relying on the hosting vendor’s infrastructure.
This is particularly useful for time-sensitive business intelligence applications.</li> <li><strong>Open Source: </strong>An open-source pipeline suits teams that want to lower upfront costs and have data engineers with the expertise to develop and modify publicly available tools.</li> </ul><h2>Building a Data Pipeline</h2> <p>Some companies run thousands of different data analysis pipelines concurrently. Still, the basic steps for building a data pipeline typically include, but are not limited to: identifying data sources; extracting and joining data from disparate sources; categorizing the data; standardizing it; cleansing and filtering it; loading it into the destination; and automating the process so that it runs continuously and on schedule.</p> <p>Data pipeline monitoring tools should be integrated into the architecture to preserve data integrity and alert administrators to failures such as network congestion or an offline destination.</p> <h2>What is a Big Data Pipeline?</h2> <p>Big data pipelines are built to accommodate the volume, velocity, and variety of big data. They often have a scalable stream-processing architecture that can capture and process data in real time and handle both structured and unstructured data formats.</p> <p>Social media interactions are a common example.
A single post on a social media platform could feed a series of pipelines that branch off into multiple other pipelines, powering applications such as sentiment analysis, a word map chart, and a report counting social media mentions.</p> <p>Source: <a href="https://www.omnisci.com/technical-glossary/data-pipeline">omnisci</a></p> </div> <div class="field field--name-field-blog-category field--type-entity-reference field--label-inline clearfix"> <div class="field__label">Category</div> <div class="field__item"><a href="/taxonomy/term/9" hreflang="en">Cloud – big data</a></div> </div> <div class="field field--name-field-tags field--type-entity-reference field--label-inline clearfix"> <h3 class="field__label inline">Tags</h3> <ul class="links field__items"> <li><a href="/taxonomy/term/229" hreflang="en">Data Strategy</a></li> <li><a href="/taxonomy/term/79" hreflang="en">Big Data</a></li> </ul> </div> <section class="field field--name-comment field--type-comment field--label-above comment-wrapper"> </section> Mon, 20 Dec 2021 16:23:48 +0000 admin 858 at https://tigosoftware.com https://tigosoftware.com/what-data-pipeline#comments DATA SILOS: THE ACHILLES HEEL OF LARGE ORGANIZATIONS https://tigosoftware.com/data-silos-achilles-heel-large-organizations <span class="field field--name-title field--type-string field--label-hidden">DATA SILOS: THE ACHILLES HEEL OF LARGE ORGANIZATIONS</span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." 
href="/user/1" lang="" about="/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">admin</a></span> <span class="field field--name-created field--type-created field--label-hidden">Fri, 09/17/2021 - 00:55</span> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>Quick question: how do you determine strategic choices for your business?</p> <p>While qualitative decision making is a viable approach, you also want to understand the underlying numbers so you can make the best-informed choice.</p> <p>Today, companies cannot make sound decisions without relying on accurately collected data. The biggest obstacle to precise data analysis is not the technology or the skills required; it is access, or rather <strong>lack of access to the data</strong>. In fact, <a href="https://www.talkwalker.com/blog/remove-data-silos-in-your-organisation-using-conversational-intelligence">47%</a> say data siloing and accessibility are the biggest challenges to gaining insight from marketing data.</p> <p>Many companies still suffer from data silos, which are a major obstacle to good decision making. Read on to learn how to break down data silos in your company.</p> <h2>What Exactly Are Data Silos? </h2> <p>A data silo, also known as an information silo, is a repository of information that remains under the control of a single department and is isolated from the rest of the organization.
</p> <p>Much like a farm silo shelters grain from the elements, a business silo is a mentality or culture that keeps members of one department from sharing data, information, developments, or even technologies with members of other departments.</p> <h3>Are data silos a natural way of doing business?</h3> <p>While data silos may seem like a normal business practice, they harm your company’s growth. They keep you from reaching your full potential because data is locked away from general access, <a href="https://www.clicdata.com/blog/data-driven-business-productivity/">impeding productivity</a>.</p> <p>Going back to the physical silo analogy, grain locked in a silo cannot serve its intended purpose until it is moved out into the world. Likewise, the solution to the data silo problem is to move data out of isolation, which enables data integration across all departments in your organization.</p> <h2>Why Do Data Silos Occur? </h2> <p>Isolated islands of information result from a wide variety of factors. Here are the most common causes in organizations that suffer from data silos:</p> <h3>1. Technological</h3> <p>Without the proper technology, data cannot pass smoothly between departments in a company. Specialized applications for fast and convenient data sharing are required, and even then, staff in every department need to be well trained in using them.</p> <h3>2. Company Growth</h3> <p>At times, a company may grow to the point where sharing information becomes difficult, whether because of many departments, offices around the globe, or simply too many employees. A company that grows too fast may also develop structural issues.</p> <h3>3. Workplace Culture</h3> <p>Unhealthy competition or animosity between teams may lead departments to withhold information from one another instead of working in unison.</p> <h3>4. 
Vendor Lock-In</h3> <p>Vendor lock-in occurs when a vendor tries to keep you within its cloud platform, making it hard to leave. Companies should therefore use multi-cloud data management instead of relying on a single vendor.</p> <h3>5. Political Reasons</h3> <p>Why do we keep secrets even from some of our friends? Because leaked information could be misused, even accidentally. The same happens in organizations, except that instead of secrets, teams keep information from one another.</p> <h2>The Consequences of Data Silos </h2> <p>While they seem harmless on the surface, data silos are like a hidden iceberg waiting to sink your company.</p> <p>Data silos exist in virtually all companies. They are especially common in large organizations with several departments that have different priorities, goals, and responsibilities. A common cause lies at the very core of the organization: the company’s structure.</p> <p>Dividing an organization into departments is necessary for focus and effectiveness. However, if the boundaries between departments become too thick, the free flow of information is soon impeded. This may result in:</p> <h3>1. Slowing down your business</h3> <p>We live in a fast-paced environment where we have to make informed decisions quickly. Waiting for information to be gathered from all departments and then processed will leave you lagging behind your competition.</p> <h3>2. Negatively impacting crucial areas of your business</h3> <p><a href="https://www.clicdata.com/blog/why-spreadsheets-arent-enough/">Inaccurate spreadsheets</a> hamper your financial planning and budgeting. Poorly organized data overburdens your IT staff. Incomplete performance management data from your HR team makes it hard to recognize and reward top employees, or even to notice a poor team culture.
You can remedy this by tracking contributions in real time with the right <a href="https://rickywang.com/best-performance-management-software">performance management software</a>.</p> <h3>3. Limited communication and collaboration </h3> <p>Data silos mean that your employees have limited access to information. As a result, they cannot use the full value of the collected data, and they miss the chance to work together toward a shared goal.</p> <h3>4. A decrease in the quality and credibility of data</h3> <p>This is one of the biggest setbacks of data silos. Isolated data quickly becomes outdated or inaccurate, which makes it close to useless. Such data can seriously distort the sampling or analysis of your company’s progress, rendering the findings inaccurate.</p> <h3>5. Storage space and accuracy</h3> <p>Without a central data storage system, employees who need quick access to data will save copies for themselves. Ten people storing copies of the same information eat up your storage budget and still make it hard to tell whose version is the latest.</p> <h2>Five Ways to Break Down Data Silos </h2> <p>To get the most out of your data, you need to access and analyze it fluidly. That means getting rid of the data silos that, to be honest, most people in your organization are not even aware of.</p> <p>Now, how do you do this?</p> <p>After all that pessimistic talk about data silos, you will be pleased to know that knocking them down is not as hard as many business owners think.</p> <p>Here are several strategies that organizations use to eliminate data silos:</p> <h3>1. Encourage communication and collaboration</h3> <p>Most, if not all, data silos are created by a lack of communication between departments in an organization.
For example, your marketing team may store data on Google Drive while their colleagues in sales use Dropbox, so neither team knows where the other keeps its files, even though they are supposed to be working together in the first place.</p> <p>You can get rid of such isolated data by encouraging your employees to collaborate in cross-functional teams. Good project management software is key to making this work.</p> <h3>2. Unify technology platforms</h3> <p>You would be surprised at just <a href="https://www.clicdata.com/blog/so-much-saas-but-no-so-much-collaboration/">how many technologies are in use</a> within a single department, let alone a whole organization. A department may have more than five databases in use, with many applications spread across them. Incompatible technology stacks that can’t be integrated lead to an endless tangle of data silos.</p> <p>Simplifying and consolidating your data management systems into a unified central system will work wonders in eliminating data silos. Start by asking your employees about their data collection methods, then decide which systems should be merged or eliminated for a more efficient solution.</p> <p>A unified view of your data also allows more thorough customer analysis. This is especially true if you run an online business such as a <a href="https://yourlifestylebusiness.com/saas-business/">SaaS business</a>, where the quality and quantity of customer data are paramount to success and strategic decision making.</p> <h3>3. Sift through outdated data</h3> <p>To create a usable <a href="https://www.clicdata.com/product/data/etl/">data management solution</a>, you have to go through months of outdated, isolated data to ensure that the data you keep is current and accurate.</p> <p>One easy way to do this is to have your entire company work through it together.
Besides weeding out unusable data, this doubles as a team-building activity. You can then make sure your employees understand how to handle data and which data storage platforms are approved.</p> <p>For example, having all your sales data centralized and kept up to date makes it easier for sales managers to monitor each sales rep’s activity. You can even use these <a href="https://www.clicdata.com/blog/the-great-top-sales-kpis-debate/">sales KPIs</a> as a starting point.</p> <h3>4. Move data into a central repository</h3> <p>How about moving all the data to a repository such as a <a href="https://www.clicdata.com/product/data/warehouse/">data warehouse</a> or a data lake? This is an effective way to eliminate data silos, since all your employees will be able to access the data whenever they need it.</p> <p>The extract, transform, and load (ETL) process is the most common method of compiling multiple data sources into a single target database. You can learn more about the ETL process <a href="https://www.guru99.com/etl-extract-load-process.html">here</a>.</p> <h3>5. Avoid vendor lock-in</h3> <p>Vendor lock-in is a serious problem for organizations that store data in the public cloud. It makes migrating data from one platform to another both expensive and time-consuming. Vendors can also use "golden handcuffs", incentives designed to keep you from leaving their platform at all.</p> <p>To avoid such data silos, use a flexible solution that gives you data autonomy: the ability to move your data out of the cloud whenever you want. A <a href="https://blog.cloudera.com/why-your-enterprise-needs-a-hybrid-cloud-strategy/">hybrid cloud strategy</a> can also help.</p> <h2>Conclusion </h2> <p>As organizations grow, data silos become an increasingly significant threat. Many companies cannot reach their full potential because of restricted access to their own data.
The good news is that there are many techniques you can use to get rid of this real headache.</p> <p>Preventing data silos will allow your organization to <a href="https://www.clicdata.com/blog/data-driven-sales-teams/">share data among teams to help drive sales</a>. When you take steps to deconstruct your data silos, you will witness results almost immediately.</p> <p>Via clicdata</p> </div> <div class="field field--name-field-blog-category field--type-entity-reference field--label-inline clearfix"> <div class="field__label">Category</div> <div class="field__item"><a href="/taxonomy/term/9" hreflang="en">Cloud – big data</a></div> </div> <div class="field field--name-field-tags field--type-entity-reference field--label-inline clearfix"> <h3 class="field__label inline">Tags</h3> <ul class="links field__items"> <li><a href="/taxonomy/term/229" hreflang="en">Data Strategy</a></li> </ul> </div> <section class="field field--name-comment field--type-comment field--label-above comment-wrapper"> </section> Thu, 16 Sep 2021 17:55:15 +0000 admin 626 at https://tigosoftware.com https://tigosoftware.com/data-silos-achilles-heel-large-organizations#comments